7 May 2026 – Tether’s AI Research Group today launched QVAC MedPsy, a new class of medical language models designed to run directly on smartphones, wearables, and other devices with limited processing power, delivering performance that rivals, and in some cases surpasses, significantly larger models while remaining local and private. Instead of scaling performance through model size, the system focuses on efficiency, reducing compute requirements and, as a result, removing reliance on remote cloud infrastructure.
Most systems today still depend on large models running on remote servers, which requires sensitive data to leave the device and pass through the cloud. In healthcare, that includes patient records, diagnostic queries, and clinical notes, all subject to strict privacy and compliance constraints. As the medical AI market scales from roughly $36 billion today toward projections exceeding $500 billion by 2033, that architecture is becoming increasingly difficult to justify.
This release challenges one of the most entrenched assumptions in AI: that better performance requires bigger models and more compute. QVAC MedPsy flips that assumption. A 1.7-billion-parameter model achieved an average score of 62.62 across seven closed-ended medical benchmarks, outperforming Google’s MedGemma-4B-it by 11.42 points despite being less than half its size. In real-world clinical scenarios like HealthBench Hard, the 1.7 billion model even beats MedGemma-27B, a model nearly sixteen times larger. Our QVAC MedPsy 4-billion-parameter version scored 70.54 across the same seven closed-ended benchmarks, exceeding models nearly seven times larger, including MedGemma-27B-text, and delivered higher performance across clinical-style evaluations such as HealthBench Hard, HealthBench, and MedXpertQA.

The evaluation covered eight diverse benchmark suites: MedQA-USMLE and MedMCQA for clinical knowledge and medical exams; MMLU Health and MMLU-Pro Health for health literacy; MedXpertQA for expert clinical reasoning; PubMedQA for biomedical research understanding; AfriMedQA for underserved global healthcare contexts; and HealthBench, including HealthBench Hard, for real-world clinical scenarios. The performance gains come from a staged medical post-training process that combines broad medical supervision, higher-value clinical reasoning data, and reinforcement learning focused on harder medical-reasoning cases.
The models also significantly reduce the cost of inference. Our QVAC MedPsy 4B model generates responses in approximately 909 tokens compared to 2,953 tokens for comparable systems, a 3.2x reduction, while the 1.7B model averages around 1,110 tokens versus 1,901 tokens, a 1.7x reduction. That translates into faster response times and the ability to run locally without depending on cloud infrastructure. The models are also being released in quantized GGUF formats for local deployment, with recommended Q4_K_M versions sized at approximately 1.2 GB for QVAC MedPsy-1.7B and 2.6 GB for QVAC MedPsy-4B. In testing, those compressed versions retained most of the benchmark performance while making the models practical for mobile and edge environments.
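The efficiency multiples quoted above follow directly from the average token counts; a minimal sketch, using only the figures reported in this release, checks the arithmetic:

```python
# Average output tokens per response, as reported in this release
# (QVAC MedPsy model vs. a comparable system).
reported = {
    "QVAC MedPsy-4B": (909, 2953),
    "QVAC MedPsy-1.7B": (1110, 1901),
}

for name, (ours, baseline) in reported.items():
    # Ratio of baseline tokens to QVAC MedPsy tokens per response.
    print(f"{name}: {baseline / ours:.1f}x fewer tokens per response")
# prints 3.2x for the 4B model and 1.7x for the 1.7B model
```

Fewer output tokens per response means proportionally less compute per answer, which is what makes the quantized, roughly 1.2 GB and 2.6 GB GGUF builds practical on mobile and edge hardware.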
This shifts where medical AI can actually be used. Systems that previously required external processing can now support clinicians directly: within on-site systems for secure, local data processing and analysis, on mobile devices, or in environments where connectivity, latency, or privacy constraints make cloud-based models impractical. It also removes one of the main barriers to adoption in healthcare: the need to move sensitive data outside of controlled environments.
“With QVAC MedPsy, our focus was improving efficiency at the model level, rather than scaling up size,” said Paolo Ardoino, CEO of Tether. “In our tests, the 1.7 billion parameter QVAC MedPsy model outperformed larger systems like MedGemma-4B, and our 4 billion model exceeded results from models nearly seven times its size, while using up to three times fewer tokens per response. That combination matters because it directly reduces compute requirements, latency, and cost. It allows the model to run locally on standard hardware instead of relying on remote infrastructure. In healthcare, that changes the constraints entirely; you can run medical reasoning where the data already exists, inside a hospital system or on a device, without moving sensitive information through the cloud or waiting on external processing.”
For the past decade, progress in AI has been tied to access to cloud-based compute. QVAC MedPsy points in a different direction, where efficiency, locality, and privacy define performance. If those gains hold in real-world deployments, they could reshape the economics of medical AI infrastructure, shifting the advantage toward systems that operate locally with lower cost, lower latency, and greater control over sensitive data.
Read more at https://qvac.tether.io/models/
About Tether Data
Tether Data, S.A. de C.V. (“Tether Data”) is part of Tether’s broader vision to advance freedom, transparency, and innovation through technology. Its mission is to enable people and organizations to connect and share information directly, without unnecessary intermediaries. By creating secure, peer-to-peer systems, Tether Data gives users greater control over their data, communications, and digital interactions. Tether Data aims to redefine how information flows across networks by replacing centralized models with decentralized infrastructure designed for privacy, efficiency, and resilience. The company’s goal is to make global connectivity faster, safer, and more private, empowering individuals and institutions alike to exchange information freely and securely.
About QVAC
QVAC is Tether Data’s advanced AI research initiative dedicated to building open, decentralized, and adaptive intelligence systems. Its mission is Local AI and Infinite Intelligence. It is guided by an uncompromising vision of a world where AI lives and learns on any device, empowering individuals and communities rather than concentrating power in corporate data centers.